Abstract By using degenerate oligonucleotides based on the sequence homology between known MutS homologues, three AtMSH cDNAs belonging to the MSH2, MSH3 and MSH6 families, as defined in eukaryotes, have been isolated from Arabidopsis thaliana (ecotype Columbia). Genomic sequences for two of these genes (AtMSH2 and AtMSH6-2) were also isolated and determined, whereas the genomic sequence of AtMSH3 was obtained through the Arabidopsis sequencing project, as was the sequence of a second, distinct AtMSH6 homologue (AtMSH6-1). Comparative analysis of the AtMSH2 Landsberg erecta genomic sequence (reported here) and the previously described AtMSH2 Columbia allele revealed several polymorphisms, including the presence of a small, transposon-like element in the 3' untranscribed region of the former allele. Arabidopsis is the first organism shown to carry two such divergent MSH6 genes; the divergence is strongly supported by sequence data and phylogenetic analysis. Southern analysis revealed that the three genes we have isolated exist as single copies, and genetic mapping indicated that AtMSH2 and AtMSH6-2 both reside on chromosome III. Finally, expression of these three genes could only be observed in suspensions of A. thaliana cells. Such a cell suspension divides actively after subculture, and the AtMSH genes are most strongly expressed at this stage.
Introduction

The mismatch repair system (MMR) is essential for genetic stability, as it removes newly arising mutations from the genome and regulates recombination between related DNA sequences. MMR is responsible for the recognition and processing of mispaired bases that are spontaneously produced in the DNA as a consequence of replication errors, genetic recombination or deamination of 5-methylcytosines (Kolodner 1996). Repair of replication errors contributes to the conservation of the original information carried by the DNA. On the other hand, the regulation of the length of recombination intermediates (heteroduplexes), and whether or not they are edited by the MMR system, define the degree to which recombination may occur between homologous but non-identical DNA sequences (Vulic et al. 1997). Recognition of mispaired bases in the heteroduplex region triggers the abortion of recombination events and prevents rearrangements between DNA sequences that are too divergent (Rayssiguier et al. 1989).

The MutHLS mismatch repair system in Escherichia coli is by far the best characterized. It derives its name from the three genes required to initiate MMR (MutH, MutL, MutS). The MutS protein is responsible for the detection of mismatches, and on binding it determines further processing and repair of mismatch-containing DNA molecules by the other components of the MMR. The MutH protein can recognize the newly replicated DNA strand, as it is transiently undermethylated at adenines in GATC sequences. Association of the MutL protein with MutS bound to the mismatch stimulates endonucleolytic cleavage of the unmethylated GATC sequence by MutH. Exonucleolytic degradation then proceeds to remove a stretch of up to 1000 bases around the mismatched base, followed by gap-repair synthesis and ligation of the correct DNA sequence (Modrich and Lahue 1996). In the absence of a functional MMR system, bacteria show mutator or enhanced recombination proficiency phenotypes (Cox 1976; Feinstein and Low 1986; Rayssiguier et al. 1989).
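The strand-directed excision step just described can be illustrated with a deliberately simplified toy model. The Python sketch below is not taken from this study: it assumes the newly synthesized strand is given as a plain string, treats every GATC on that strand as a potential MutH nick site (strand discrimination by hemimethylation is not modelled), and uses the 1000-base figure above as a hard limit on tract length. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Upper bound on the excision tract, taken from the "up to 1000 bases" in the
# text and treated here as a hard cut-off (a simplifying assumption).
MAX_TRACT = 1000


def gatc_sites(strand: str) -> List[int]:
    """0-based start positions of every GATC site (potential MutH nick sites)."""
    return [i for i in range(len(strand) - 3) if strand[i:i + 4] == "GATC"]


def excision_tract(strand: str, mismatch: int) -> Optional[Tuple[int, int]]:
    """Toy model of MutHLS excision on the new strand.

    The nick is placed at the GATC site closest to the mismatch and the tract
    runs from the nick through the mismatched base. Returns a half-open
    (start, end) interval, or None if no GATC site is within MAX_TRACT bases.
    """
    best: Optional[Tuple[int, int]] = None
    for site in gatc_sites(strand):
        if site <= mismatch:
            tract = (site, mismatch + 1)   # nick lies 5' of the mismatch
        else:
            tract = (mismatch, site + 4)   # nick lies 3' of the mismatch
        length = tract[1] - tract[0]
        if length <= MAX_TRACT and (best is None or length < best[1] - best[0]):
            best = tract
    return best


if __name__ == "__main__":
    new_strand = "AAGATC" + "A" * 14   # hypothetical 20-base strand
    print(excision_tract(new_strand, 10))
```

With a single GATC starting at position 2 and a mismatch at position 10, the script prints `(2, 11)`: the tract runs from the nick through the mismatched base.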

Our current knowledge suggests that the general features of the bacterial MMR are rather well conserved across all living organisms. Thus, mismatch repair activity or mismatch repair genes have been detected in a wide variety of eukaryotes (Modrich and Lahue 1996). Mismatch repair can be assayed following transfection of artificially constructed heteroduplex DNAs into yeast (Bishop et al. 1989; Kramer et al. 1989) and mammalian cells (Folger et al. 1985; Hare and Taylor 1985; Brown and Jiricny 1988), or after incubation in cell-free extracts from Drosophila (Holmes et al. 1990; Bhui-Kaur et al. 1998), Xenopus (Varlet et al. 1996) or human cells (Holmes et al. 1990; Thomas et al. 1991). This mismatch repair activity is abolished in all available mutants that are deficient for the mismatch repair functions (Parsons et al. 1993; Umar et al. 1994; Luhr et al. 1998). Finally, homologues of MutS and MutL have been isolated from eukaryotes, but their number suggests a higher level of complexity of the MMR system, or the involvement of more specialized processes. Whereas in bacteria the MutS proteins seem to belong to two different lineages (MutS-I and MutS-II as defined by Eisen 1998), which are not necessarily both present in every bacterial species, gene duplication and functional specialization have led to the divergence of many MutS homologues in eukaryotes. Six MutS homologues coexist in yeast: Msh2, Msh3 and Msh6 are involved in nuclear MMR, Msh4 and Msh5 participate in meiotic recombination and, finally, Msh1 is involved in MMR in the mitochondria (Reenan and Kolodner 1992b; Ross-Macdonald and Roeder 1994; Hollingsworth et al. 1995; Marsischky et al. 1996). Msh2, Msh3 and Msh6 homologues have since been found in many different organisms, including mammals, Drosophila, Neurospora and Arabidopsis (for review, see Kolodner 1996). These eukaryotic MSH genes are classified as belonging to the MutS-I lineage (MSH1, MSH2, MSH3 and MSH6) or the MutS-II lineage (MSH4 and MSH5) (Eisen 1998).

According to the current model for mismatch repair in eukaryotes, Msh2 interacts with either Msh3 or Msh6 to form complexes with different recognition specificities: Msh2/3 complexes show a greater affinity for small insertions/deletions, and Msh2/6 complexes for single-base mismatches (Kolodner 1996; Marsischky et al. 1996). This model is clearly supported by genetic and biochemical data (Reenan and Kolodner 1992b; Drummond et al. 1995; Palombo et al. 1995; Acharya et al. 1996; Marsischky et al. 1996; Genschel et al. 1998). Yeast msh2 mutants exhibit a mutator phenotype reminiscent of the mutS phenotype in bacteria, as do msh3 msh6 double mutants (Marsischky et al. 1996). In humans, it is now well established that MMR deficiencies can lead to some hereditary cancer predisposition syndromes (Modrich and Lahue 1996). These cancers are associated with genetic instability, a phenotype that can be detected as an increased mutation rate in reporter genes or in tracts of short repeated DNA sequences. Such microsatellite instabilities presumably result from slippage of the replication machinery, which generates short insertions/deletions that would normally be recognized and repaired by the MMR. They are specifically observed in msh2, msh3 or msh6 tumor cells and in individuals with germline mutations in these genes (Modrich and Lahue 1996; Risinger et al. 1996; Akiyama et al. 1997; Miyaki et al. 1997). Genetic variability and cancer susceptibility are also dramatically increased in mice carrying null mutations in the MSH2 or MSH6 gene (de Wind et al. 1995; Reitmair et al. 1995; Edelmann et al. 1997).
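Microsatellite instability is scored as length changes in short tandem repeats. As an illustration only, the sketch below locates perfect mono-, di- and trinucleotide repeats in a DNA string and reports the length difference between two alleles of one tract. The thresholds (repeat units of up to 3 bases, at least five copies) and the function names are arbitrary assumptions, not criteria taken from the studies cited above.

```python
import re
from typing import List, Tuple


def find_microsatellites(seq: str, max_unit: int = 3,
                         min_copies: int = 5) -> List[Tuple[int, str, int]]:
    """Return (start, repeat unit, copy number) for each perfect tandem repeat.

    Units of 1..max_unit bases repeated at least min_copies times are reported.
    The lazy quantifier makes the shortest unit that fits win, so a run of A's
    is reported as unit "A" rather than "AA".
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


def length_shift(normal_tract: str, tumour_tract: str) -> int:
    """Length difference (bp) between two alleles of the same repeat tract;
    a non-zero value is the kind of change scored as instability."""
    return len(tumour_tract) - len(normal_tract)


if __name__ == "__main__":
    seq = "CC" + "A" * 8 + "GT" * 5 + "CC"   # hypothetical test sequence
    print(find_microsatellites(seq))
    print(length_shift("CA" * 10, "CA" * 8))
```

The regular-expression approach only finds perfect repeats; interrupted repeats would need an alignment-based tandem-repeat finder.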

As well as its role in surveillance of replication fidelity, the MMR is also involved in regulating genetic recombination between homologous but non-identical DNA sequences (Rayssiguier et al. 1989). If the outcome of recombination depends on the formation of a heteroduplex intermediate, the presence of mismatches in the heteroduplex makes it an obvious target for the MMR. A functional MMR system acting upon the mismatches can destabilize the heteroduplex, thus impeding recombination between homeologous DNA sequences. Studies in bacteria, yeast and mouse cells have all shown that mutations that affect components of the MMR can markedly increase the amount of recombination between divergent DNA sequences (Rayssiguier et al. 1989; Selva et al. 1995; de Wind et al. 1995; Datta et al. 1997).

Not much is known about mismatch repair in plants. Plant cell extracts from pea can repair mismatched oligonucleotides (Cerovie et al. 1991), and an MSH2 homologue was recently isolated from Arabidopsis thaliana (Culligan and Hays 1997). With the aim of gaining insight into the role and activity of mismatch repair in plants, we have isolated homologues of MSH2, MSH3 and MSH6 from A. thaliana. Here, we provide a detailed characterization of these genes, which are expressed at detectable levels only in a mitotically active cell suspension derived from Arabidopsis.



Materials and methods 

Growth of cell suspension

The cell suspension (ecotype Columbia) was initiated by Axelos et al. (1992) and is continuously propagated by weekly subculture (1.5 ml/25 ml) in Gamborg's basal medium (G5893, Sigma), 30 g/l sucrose, 200 mg/l naphthalene acetic acid. The cell suspension is grown under agitation in a growth chamber with a 16-h photoperiod. Harvested plant cell material was stored at -70°C before extraction of RNA.

RNA isolation and Northern analysis

Total RNA was extracted from the cell suspension in the presence of Trizol (Gibco BRL) after homogenizing the cells. Poly(A)+ RNA was isolated using the Dynabeads mRNA Direct kit (Dynal). Poly(A)+ RNA was fractionated in formaldehyde gels after denaturation (Sambrook et al. 1989). Gels were transferred onto Nylon Hybond N+ membranes (Amersham) by capillary blotting. After hybridization to radiolabelled probes, the filters were washed in 0.1× SSC, 0.1% SDS at 62°C and autoradiographed.



Genomic DNA isolation and Southern analysis 

Genomic DNA was extracted from the cell suspension according to Dellaporta et al. (1983). Enzymatic digestion and electrophoresis of DNA were done using standard techniques. DNA was transferred onto Nylon Hybond N+ membranes (Amersham) by capillary blotting. Genomic DNA sequences were isolated from a previously constructed Sau3AI partial genomic library (Doutriaux et al.



Radiolabelled probes

Radiolabelling of the probes with 32P was carried out with the Stratagene Prime-It II kit. Hybridizations with 32P-radiolabelled probes corresponding to the complete coding regions of the gene for translation elongation factor EF-1α from bean (pCHA0041; Axelos et al. 1989), AtRAD51 (Doutriaux et al.) and the 25S ribosomal RNA gene (Arabidopsis Biological Resource Center) were performed at 62°C according to Church and Gilbert (1984).



Reverse transcription and PCR 

An aliquot of total RNA was reverse transcribed using MMLV reverse transcriptase, by priming with random oligonucleotides in the presence of dNTPs. Using two different sets of degenerate oligonucleotides (each primer at 1 µM), PCR was performed on first-strand cDNA or genomic DNA in a final volume of 100 µl in the presence of dNTPs (0.2 mM), 1× PCR buffer and Taq polymerase (2 U). PCR parameters for set 1 oligonucleotides (touchdown PCR) were three rounds of three cycles each (94°C for 1 min; 45°C, 41°C and 37°C for 2 min each, and 72°C for 1 min), followed by 35 cycles of 94°C for 30 s, 48°C for 30 s and 72°C for 30 s, with a final step of 10 min at 72°C. For set 2 oligonucleotides, PCR was carried out at 94°C for 5 min, followed by 30 cycles of 95°C for 40 s, 45°C for 1 min and 72°C for 1 min. The amplification products At23 and At24 (obtained with set 1), and S5 and S8 (set 2), were subcloned and sequenced. Of these clones, At24 (654 bp, derived from genomic amplification) was homologous to MSH2, S5 (351 bp) was homologous to MSH3, and At23 (623 bp) and S8 (351 bp) were identical (except for the presence of introns in At23, which was amplified from genomic DNA) and homologous to MSH6.
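The two cycling programs above can be written down as data, which makes the touchdown structure explicit. This Python sketch is an assumed encoding, not part of the original protocol: it reads the set 1 touchdown as three rounds of three cycles annealing at 45, 41 and 37°C in turn, ignores ramping time between temperatures, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    temp_c: float   # block temperature in °C
    seconds: int    # hold time at that temperature


@dataclass
class Segment:
    cycles: int
    steps: List[Step]

    def seconds(self) -> int:
        return self.cycles * sum(step.seconds for step in self.steps)


def total_minutes(program: List[Segment]) -> float:
    """Total hold time in minutes; ramping between temperatures is ignored."""
    return sum(seg.seconds() for seg in program) / 60


def amplification_cycles(program: List[Segment]) -> int:
    """Count cycles of multi-step segments (single holds are not cycles)."""
    return sum(seg.cycles for seg in program if len(seg.steps) > 1)


# Set 1: touchdown, read as three rounds of three cycles annealing at
# 45, 41 and 37 °C in turn, then 35 cycles, then a 10-min final extension.
SET1 = [Segment(3, [Step(94, 60), Step(t, 120), Step(72, 60)]) for t in (45, 41, 37)]
SET1 += [Segment(35, [Step(94, 30), Step(48, 30), Step(72, 30)]),
         Segment(1, [Step(72, 600)])]

# Set 2: initial denaturation, then 30 cycles (no final extension is given).
SET2 = [Segment(1, [Step(94, 300)]),
        Segment(30, [Step(95, 40), Step(45, 60), Step(72, 60)])]

if __name__ == "__main__":
    for name, prog in (("set 1", SET1), ("set 2", SET2)):
        print(name, amplification_cycles(prog), "cycles,", total_minutes(prog), "min")
```

Under these assumptions the set 1 program has 44 amplification cycles and 98.5 min of hold time, and the set 2 program 30 cycles and 85 min.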

Isolation of AtMSH2 cDNA

To obtain a cDNA clone for AtMSH2, ten pools of 10,000 clones each from library CD4-16 (ecotype Columbia, provided by the Arabidopsis Biological Resource Center) were plated on 15-cm Petri dishes. The amplified phages were collected in 3 ml of SM buffer (10 mM NaCl, 1 mM MgSO4·7H2O, 50 mM Tris-HCl pH 7.5, 2% gelatin), of which 1 µl was used to perform PCR with the primers MSH2-1 and MSH2-2, which are specific for AtMSH2. One of the positive pools was used to generate ten pools of 1,000 clones each. PCR was used to identify positive pools of 1,000 phages, from which two replicate filters were lifted. Two positive plaques were identified following hybridization with the At24 insert, and in vivo excision of the insert (Stratagene) was used to obtain a plasmid version of one of the cloned cDNAs.
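The PCR-based screen of the cDNA library is a form of sib selection: pools are tested, and only a positive pool is subdivided further. The sketch below models it with ten pools per round and a stopping size of 1,000 clones, as in the text. Testing a pool is represented by a stand-in predicate for PCR with MSH2-1 and MSH2-2; the clone identifiers and the function name are hypothetical.

```python
import math
from typing import Callable, Optional, Sequence, Tuple


def sib_select(clones: Sequence[int], carries_target: Callable[[int], bool],
               n_pools: int = 10,
               stop_at: int = 1000) -> Tuple[Optional[Sequence[int]], int]:
    """Successively split a library into pools and keep a PCR-positive one.

    A pool "tests positive" if any clone in it carries the target, standing in
    for a PCR with gene-specific primers. Splitting stops once a pool has at
    most stop_at clones (plaque hybridization takes over from there).
    Returns the final pool (or None if nothing is positive) and the number
    of PCR tests run.
    """
    tests = 0
    while len(clones) > stop_at:
        size = math.ceil(len(clones) / n_pools)
        pools = [clones[i:i + size] for i in range(0, len(clones), size)]
        results = [any(carries_target(c) for c in pool) for pool in pools]
        tests += len(pools)          # every pool gets its own PCR
        if not any(results):
            return None, tests
        clones = pools[results.index(True)]
    return clones, tests


if __name__ == "__main__":
    pool, n_tests = sib_select(range(100_000), lambda c: c == 54_321)
    print(pool, n_tests)
```

For a 100,000-clone library containing one positive clone, this takes two rounds of ten PCR tests each before the remaining 1,000 clones are screened by hybridization.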



Isolation of the AtMSH3 and AtMSH6 cDNA sequences

Complete cDNA sequences were isolated using the procedure supplied with the Marathon cDNA amplification kit (Clontech). Briefly, double-stranded cDNA was produced by reverse transcription of 2 µg of poly(A)+ RNA from the cell suspension culture of Arabidopsis. Adaptors were ligated on each side of the cDNA. The ligated cDNA was then used as a template for 5' and 3' RACE PCR reactions in the presence of primers specific for the adaptor on one side (AP1 and AP2), and specific for the targeted gene on the other side (see below), as defined from the previously isolated consensus regions S5 and S8. A 5' and a 3' fragment that overlap were thus produced for each gene.



Isolation of the complete coding sequence of AtMSH3

PCR performed on the ligated cDNA with primers 636 and AP1 for the 5' RACE PCR was followed by a second round of amplification with the nested primers AP2 and S525, which resulted in a 2720-bp DNA fragment. Another primer (S51) was designed that anneals closer to the 5' end and permitted the determination of 99 bp upstream of the ATG initiation codon. For the 3' RACE PCR, a first PCR reaction was performed with primers AP1 and 635, followed by a second round of amplification using the nested primers AP2 and S523, which produced a DNA fragment of 890 bp. Both DNA fragments were subcloned into pGEM-T and sequenced. Since PCR amplification using the Expand Long Template PCR system (Boehringer Mannheim) produced errors in the sequence, new oligonucleotides were designed to re-isolate these sequences by PCR, using the high-fidelity DNA polymerase Pfu. PCR with primers 1S5 and S53 amplified a 1244-bp fragment (cloned into pUC18/SmaI). PCR with primers S52 and 2S5 amplified a 2104-bp fragment (cloned into pUC18/SmaI). These two clones were ligated after digestion with BamHI, for which a unique site is present in the overlapping region. The complete reconstituted AtMSH3 coding sequence is 3246 bp long.
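At the sequence level, reconstituting the coding sequence from the two Pfu clones amounts to splicing them at the unique BamHI site (GGATCC) in their overlap. The following Python sketch shows that bookkeeping under the assumption that the site occurs exactly once in each fragment and lies within the overlap; the function name and the short test fragments are hypothetical.

```python
def join_at_unique_site(five_prime: str, three_prime: str, site: str = "GGATCC") -> str:
    """Splice two overlapping fragments at a restriction site that occurs
    exactly once in each (BamHI, GGATCC, by default).

    Sequence-level view only: everything upstream of the site comes from the
    5' clone, and the site plus everything downstream from the 3' clone.
    """
    for label, fragment in (("5'", five_prime), ("3'", three_prime)):
        if fragment.count(site) != 1:
            raise ValueError(f"{site} must occur exactly once in the {label} fragment")
    return five_prime[:five_prime.index(site)] + three_prime[three_prime.index(site):]


if __name__ == "__main__":
    left = "ATGAAACCCGGATCCTTT"    # hypothetical 5' clone
    right = "CCCGGATCCTTTGGGTAA"   # hypothetical 3' clone
    print(join_at_unique_site(left, right))
```

Requiring a single occurrence in each fragment mirrors the reason a unique site was chosen: any second site would cut the clone in an unintended place.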



Isolation of the complete coding sequence of AtMSH6-2

The same procedure allowed the isolation of the AtMSH6-2 cDNA. For the 5' RACE PCR, primers 638 and AP1 allowed the amplification of a DNA fragment of approximately 2.6 kb, and primer S81 helped define the 142 bp upstream of the ATG initiation codon. On the 3' side, RACE PCR was initially performed with primers S823 and AP1, and then with the nested primers 637 and AP2, to produce a 774-bp DNA fragment. As for AtMSH3, these fragments were cloned and sequenced. Because of PCR errors, this DNA sequence was re-isolated using the high-fidelity Pfu polymerase and the newly designed primers 1S8 and S83 (for the 5' side, 2162-bp clone 43 in pUC18/SmaI), and primers S82 and 2S8 (for the 3' side, an approximately 1.4-kb clone 62 in pUC18/SmaI). Clones 43 and 62 were digested with a restriction enzyme for which a unique site is present in the overlapping region, and ligated. The complete reconstituted AtMSH6-2 coding sequence is 3330 bp long. An AtMSH6-2 genomic sequence was also isolated from a genomic DNA library constructed from a partial Sau3AI digest of DNA from the Arabidopsis cell suspension. A stretch of 8062 bp was sequenced that included the entire AtMSH6-2 gene, which was precisely colinear with the cDNA.



Oligonucleotides

Two sets of degenerate MSH primers were used. Set 1 comprised MMR1 [5'-CGTGGATCCTCACTGGICCNAA(C/T)ATGGG-3'] and MMR2 [5'-GGTGAATTCGTGGAA(A/G)TGIGTNGC(A/G)AA-3']. Set 2 (as in Reenan and Kolodner 1992a) consisted of MMR3 [5'-CTGGATCCACIGGICCTAA(C/T)ATG-3'] and MMR4 [5'-CTGGATCC(A/G)TA(A/G)TGIGTI(A/G)C(A/G)AA-3'].
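The number of distinct oligonucleotides in a degenerate primer is the product of the choices at each position. In this Python sketch, parenthesized alternatives such as (C/T) and N (any base) are expanded, while inosine (I) is counted as a single universal position; that treatment, and the function names, are our assumptions. The example uses MMR1 as printed above.

```python
import itertools
import math
from typing import List

# Inosine (I) pairs with several bases and is kept as one "universal" position,
# so it does not multiply the number of distinct oligonucleotides.
FIXED = set("ACGTI")


def parse_primer(primer: str) -> List[List[str]]:
    """Turn e.g. "5'-GGICCNAA(C/T)ATG-3'" into per-position base choices."""
    core = primer.strip()
    if core.startswith("5'-"):
        core = core[3:]
    if core.endswith("-3'"):
        core = core[:-3]
    positions: List[List[str]] = []
    i = 0
    while i < len(core):
        ch = core[i]
        if ch == "(":
            j = core.index(")", i)            # raises ValueError if unclosed
            positions.append(core[i + 1:j].split("/"))
            i = j + 1
            continue
        if ch == "N":
            positions.append(list("ACGT"))
        elif ch in FIXED:
            positions.append([ch])
        else:
            raise ValueError(f"unexpected symbol {ch!r} at position {i}")
        i += 1
    return positions


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by the primer."""
    return math.prod(len(choices) for choices in parse_primer(primer))


def expand(primer: str) -> List[str]:
    """Every concrete sequence in the degenerate mixture."""
    return ["".join(combo) for combo in itertools.product(*parse_primer(primer))]


MMR1 = "5'-CGTGGATCCTCACTGGICCNAA(C/T)ATGGG-3'"

if __name__ == "__main__":
    print(degeneracy(MMR1), sorted(expand(MMR1))[:2])
```

Under this convention MMR1 corresponds to a mixture of 8 distinct 28-mers (4 choices at N times 2 at the C/T position).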

As AtMSH2-specific primers, MSH2-1 (5'-TCCACTTACATCCGCCAGGTTGATG-3'), MSH2-2 (5'-ATGCTCACATATAGCCCAAGCTAAACC-3') and MSH2-3 (5'-AAACTTGTGAGCTCGCTCTGCCCC-3') were used.
The AtMSH3-specific primers were 1S5 (5'-ATCCCGGCATGGGCAAGCAAAAGCAGCAGACGA-3'), 2S5 (5'-ATCCCGGGTCAAAATGAACAAGTTGGTTTTAGTC-3'), S53 (5'-GACAAAGAGCGAAATGAGGCCCCTTGG-3'), S52 (5'-GCCACATCTGACTGTTCAAGCCCTCGC-3'), S51 (5'-GGATCCGGTACTGGGTTTTCAGTGTGAGG-3'), S525 (5'-AGGTTCTGATTATGTGTGACGCTTTACTTA-3'), S523 (5'-TCAGACAGTATCCAGCATGGCAGAAGTA-3'), 635 (5'-GCACGTGCTTGATGGTGTTTTCAC-3') and 636 (5'-TGCTAGTGCCTCTTGCAAGCTCAT-3').

The AtMSH6-2-specific primers used were 1S8 (5'-ATCCCGGGATGCAGCGCCAGAGATCGATTTTGT-3'), 2S8 (5'-ATCCCGGGTTATTTGGGAACACAGTAAGAGGATT-3'), S82 (5'-GCGTTCGATCATCAGCCTCTGTGTTGC-3'), S83 (5'-CGCTATCTATGGCTGCTTCGAATTGAG-3'), S81 (5'-CGTCGCCTTTAGCATCCCCTTCCTTCAC-3'), 637 (5'-GACAGGGTCAGTTCTTCAGAATGC-3'), 638 (5'-TCTCTACCAGGTGACGAAAAACCG-3') and S823 (5'-GCTTGGCGCATCTAATAGAATCATGACAGG-3').

Genetic mapping of AtMSH2 and AtMSH6-2

Primers MSH2-1 and MSH2-3 were used to amplify a 13-kb segment of AtMSH2 from the ecotypes Landsberg erecta and Columbia. A polymorphic MboI site was identified by sequence analysis and used to score 96 recombinant inbred (RI) lines resulting from a cross between Landsberg erecta and Columbia (Lister and Dean 1993). For AtMSH6-2, an RFLP between these two ecotypes was observed following digestion of genomic DNA with HindIII and hybridization with a PCR product of 2 kb. This



Phylogenetic analyses

Alignment of the sequences was carried out visually with the help of the ED program in the MUST package version 1.0 (Philippe 1993). Phylogenetic trees were constructed using maximum likelihood (ML), maximum parsimony (MP) and distance-based methods (Neighbor Joining, NJ) with the programs PROTML version 2.3 (Adachi and Hasegawa 1996), PAUP version 3.1 (Swofford 1993) and NJ in the MUST package version 1.0 (Philippe 1993), respectively. The distances were computed with the substitution model of Kimura (1983). MP trees were obtained by 100 random-addition heuristic search replicates, and ML trees by the Quick Add OTUs search, with the JTT model of amino acid substitution and retaining the 500 top-ranking trees (options -jf -q -n 500). Since it is important to take among-site rate variation into account in inferring phylogeny (Yang 1996), these 500 trees were further analysed with the PUZZLE program (Strimmer and von Haeseler 1996) as user trees with eight Gamma rate categories. Bootstrap proportions were calculated by the analysis of 1000 replicates for MP and NJ analysis. For ML analysis, bootstrap proportions were computed using the RELL method (Kishino and Hasegawa 1989) owing to limitations on computing time.
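Two quantities mentioned above are easy to make concrete: Kimura's (1983) empirical protein distance, d = -ln(1 - p - 0.2p²), where p is the proportion of differing residues, and the column resampling that underlies bootstrap proportions. The Python sketch below is illustrative only and is not the MUST or PAUP code; skipping gap-containing columns when computing p is an assumption, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues, skipping columns with a gap in either."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def kimura_protein_distance(a: str, b: str) -> float:
    """Kimura's (1983) empirical correction: d = -ln(1 - p - 0.2 p^2)."""
    p = p_distance(a, b)
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0:
        raise ValueError("sequences too divergent for this approximation")
    return -math.log(arg)


def distance_matrix(seqs: Sequence[str]) -> List[List[float]]:
    """Pairwise Kimura distances, the input a neighbor-joining program uses."""
    n = len(seqs)
    return [[0.0 if i == j else kimura_protein_distance(seqs[i], seqs[j])
             for j in range(n)] for i in range(n)]


def bootstrap_replicate(alignment: Sequence[str], rng: random.Random) -> List[str]:
    """One bootstrap pseudo-sample: alignment columns resampled with replacement."""
    length = len(alignment[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


if __name__ == "__main__":
    toy = ["MKLV", "MKIA", "MRIA"]   # hypothetical aligned fragments
    print(distance_matrix(toy))
    print(bootstrap_replicate(toy, random.Random(1)))
```

A bootstrap proportion is then the fraction of replicate trees, each built from one such pseudo-sample, that contain a given clade.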



Results 

Isolation of the AtMSH2, AtMSH3 and AtMSH6-2 cDNAs

Based upon a comparison of conserved amino acid sequences in known MutS-related proteins from various species, a set (set 1) of degenerate oligonucleotides was designed; the second set (set 2) has been described previously (Reenan and Kolodner 1992a). PCR amplifications were performed using either Arabidopsis (ecotype Columbia) genomic DNA or first-strand cDNA as a template. This allowed the isolation of consensus regions for three potential homologues of mutS. At24 (654 bp), At23 (623 bp), S5 (351 bp) and S8 (351 bp) were cloned, and sequence analysis indicated that they were homologous, respectively, to MSH2 (At24), MSH3 (S5) and MSH6 (At23, S8), three of the MSH genes previously described in yeast. After designing oligonucleotides specific for the genes of interest, two different approaches were taken in order to isolate their complete cDNA sequences. AtMSH2 was isolated from a cDNA library, after successive rounds of selection of positive clones by PCR; AtMSH3 and AtMSH6-2 were isolated following the Marathon cDNA amplification procedure, which relies on 5' and 3' RACE-PCR.

The AtMSH2 cDNA clone is 3039 bp long and contains an ORF of 2811 nucleotides, which is identical to that reported recently by Culligan and Hays (1997). The predicted protein is 937 amino acids long, with a predicted molecular weight of 105.5 kDa. The reconstituted AtMSH3 sequence is 3553 bp long and contains a 3246-bp ORF with untranslated regions of 99 bp (5') and 144 bp (3') (EMBL/GenBank Accession No. AJ007791). The cDNA encodes a putative protein of 1081 amino acids, with a predicted molecular weight of 117.6 kDa. The AtMSH6-2 sequence is 3701 bp long and contains an ORF encoding 1109 amino acids (predicted molecular weight 122.5 kDa); its coding region starts 141 bp from the 5' end and the poly(A) tail starts 106 bp downstream of the TAA stop codon (EMBL/GenBank Accession No. AJ007792). A short sequence (351 bp) that is identical to the AtMSH6-2 consensus region has previously been described by Culligan and Hays (1997).
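The ORF and protein lengths above are related by simple codon arithmetic: 3246 bp is 1082 codons, i.e. 1081 residues plus a stop codon, and 3330 bp gives 1109 residues in the same way, whereas the 2811-nt AtMSH2 ORF is exactly 937 codons, matching 937 residues if the stop codon is not counted. The Python sketch below handles both conventions and estimates a molecular weight from standard average residue masses. Because the protein sequences are not reproduced here, it cannot recompute the reported values, and small differences from other tools' mass tables are to be expected.

```python
# Standard average residue masses in daltons (free amino acid minus water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # added once for the free N- and C-termini


def protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Residues encoded by an ORF of orf_bp nucleotides."""
    if orf_bp % 3:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


def average_mass_kda(protein: str) -> float:
    """Average molecular mass of a protein sequence, in kDa."""
    return (sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER) / 1000


if __name__ == "__main__":
    print(protein_length(3246), protein_length(3330))      # AtMSH3, AtMSH6-2
    print(protein_length(2811, includes_stop=False))       # AtMSH2
```

As a rough cross-check, 1081 residues at about 110 Da each gives roughly 119 kDa, the same order as the predicted 117.6 kDa.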

In the predicted protein sequences, the typical Msh functional domains can be found at the C-terminal end (Fig. 1). Like other members of the MutS family, AtMsh2, AtMsh3 and AtMsh6-2 have the four motifs (A-D; see Fig. 1) characteristic of an NTP-binding domain, as defined by Gorbalenya and Koonin (1990) for the superfamily of UvrA-related proteins. The second conserved domain, containing the residues essential for the formation of the helix-turn-helix structure (HTH; see Fig. 1), is also present in the Arabidopsis Msh proteins (Ohlendorf et al. 1983).
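Motif A of the NTP-binding domain corresponds to the Walker A motif, or P-loop. As a rough illustration, the sketch below scans a protein sequence with the PROSITE pattern PS00017, [AG]-x(4)-G-K-[ST]. It does not capture motifs B to D or the HTH region, and the test fragment is a made-up sequence containing a MutS-like GPNMGGKS stretch.

```python
import re
from typing import List, Tuple

# PROSITE PS00017 (ATP/GTP-binding site motif A, P-loop): [AG]-x(4)-G-K-[ST].
# The lookahead lets overlapping occurrences be reported.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(protein: str) -> List[Tuple[int, str]]:
    """0-based start and matched residues of every P-loop-like motif."""
    return [(m.start(), m.group(1)) for m in P_LOOP.finditer(protein.upper())]


if __name__ == "__main__":
    print(find_p_loops("LLTGPNMGGKSTYIR"))   # hypothetical fragment
```

A pattern match of this kind is only a first pass; the assignment of the motifs in Fig. 1 rests on the full alignment.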

Genomic clones were isolated for both AtMSH2 and AtMSH6-2. The AtMSH2 genomic clone which we report here was isolated from the ecotype Landsberg erecta (GenBank Accession No. AF109243), and it shows several differences from the previously reported genomic clone of the Columbia allele (Culligan and Hays 1997; GenBank Accession No. AF003005). While the number and position of all 12 introns are conserved in both alleles, numerous polymorphisms are seen in both the coding and non-coding regions (see Fig. 2). Within the 13 exons, a total of 11 single-base substitutions was observed, of which six are neutral and five lead to a change in the amino acid sequence. None of these changes occurs at a position which is conserved among the eukaryotic MSH2 genes. The most striking difference between the two alleles, however, is a 239-bp insertion located 196 bp after the stop codon in the 3' untranscribed region of the Landsberg erecta allele. This insertion is flanked by a direct duplication of 5 bp and bears many but not all of the features of a miniature inverted-repeat transposable element (MITE), a class of small transposable elements recently reported in plants (Bureau and Wessler 1994b). This element differs from the Emigrant element, the only MITE reported to date in Arabidopsis (Casacuberta et al. 1998), and will be described in detail elsewhere (J. Ade and F. Belzile, unpublished).
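Classifying the coding polymorphisms as neutral or amino-acid-changing, and checking for a target-site duplication around an insertion, are both simple sequence operations. The Python sketch below uses the standard genetic code and a 5-bp duplication, as in the text. The sequences in the example are hypothetical, and when two substitutions fall in the same codon the sketch classifies each one using the full alternative codon.

```python
from itertools import product
from typing import List, Optional, Tuple

# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitutions(ref: str, alt: str) -> List[Tuple[int, str, str, str]]:
    """(position, ref codon, alt codon, synonymous/nonsynonymous) per difference
    between two in-frame coding sequences of equal length."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != len(alt) or len(ref) % 3:
        raise ValueError("expected two in-frame coding sequences of equal length")
    calls = []
    for i, (r, a) in enumerate(zip(ref, alt)):
        if r == a:
            continue
        start = i - i % 3
        ref_codon, alt_codon = ref[start:start + 3], alt[start:start + 3]
        kind = "synonymous" if CODE[ref_codon] == CODE[alt_codon] else "nonsynonymous"
        calls.append((i, ref_codon, alt_codon, kind))
    return calls


def target_site_duplication(seq: str, start: int, end: int, k: int = 5) -> Optional[str]:
    """Return the k-bp direct repeat flanking the insertion seq[start:end], if any."""
    if start < k or end + k > len(seq):
        return None
    left, right = seq[start - k:start], seq[end:end + k]
    return left if left == right else None


if __name__ == "__main__":
    print(classify_substitutions("ATGGCTTGA", "ATGGCCTGA"))   # GCT -> GCC, Ala -> Ala
    allele = "CCC" + "GGTAC" + "TTTAAA" + "GGTAC" + "CCC"     # toy insertion at 8..14
    print(target_site_duplication(allele, 8, 14))
```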

An 8062-bp genomic region (Columbia ecotype) that encompasses the AtMSH6-2 gene was also defined, revealing the presence of 16 introns within the sequenced region (EMBL/GenBank Accession No. AJ007792). The sequence of the genomic region of AtMSH3 has been completed recently in the course of the Arabidopsis sequencing project. The exons of this gene are found within a stretch of the BAC clone M7J2 (GenBank Accession No. AL022197). Southern analysis of restriction digests of genomic Arabidopsis DNA with probes corresponding to the genomic consensus regions for the genes AtMSH3 and AtMSH6-2 indicates that they are single-copy genes and do not cross-hybridize (see Fig. 3). The sizes of the detected fragments always correlated exactly with their expected sizes, whenever these could be determined from available sequence information. Surprisingly, a fourth MSH gene was encountered in the course of the Arabidopsis genome sequencing project (ID ATAF1308, product name T10M13.8). Sequence comparisons indicate that this gene is related to the MSH6 family; since it was the first AtMSH6 to be released in the databases, we have named it AtMSH6-1 in this study. Its complete genomic sequence includes 19 introns, of which only two coincide with introns in the AtMSH6-2 sequence (data not shown).

Fig. 2 Polymorphisms between the Landsberg erecta and Columbia alleles of AtMSH2. In this diagram, exons are shown as open rectangles, whereas introns are drawn as V-shaped lines between exons. The positions of the start (ATG) and stop (TGA) codons, as well as that of each of the 11 polymorphisms (all single-base substitutions) located within the coding region, are indicated above the gene. Position 1 refers to the first base of the genomic sequence of this allele (GenBank accession AF109243). Substitutions that lead to a change in amino acid sequence are indicated by asterisks. The nature (number of base substitutions or length of insertion/deletion) of the polymorphisms located in introns is indicated below the diagram. A 239-bp miniature inverted-repeat transposable element (MITE-like) insertion (hashed box) flanked by a 5-bp duplication (-) at the insertion site was found in the 3' region of the Landsberg erecta allele only.

Fig. 3 Southern analysis of the genomic AtMSH3 and AtMSH6-2 loci. Total Arabidopsis DNA from the Arabidopsis cell suspension culture was digested with BamHI (B), BglII (Bg), EcoRI (E), HindIII (H), PstI (P) or XhoI (X). Positions of the size markers are shown on the left. The 32P-radiolabelled probes used (S5 and S8, see Materials and methods) covered the consensus region of genes MSH3 and MSH6, respectively.


Genetic mapping

The chromosomal positions of the genes AtMSH2, AtMSH6-2 and AtMSH3 were determined. In the case of AtMSH2, a CAPS marker was developed based on a polymorphic MboI site present in Landsberg erecta DNA but absent from Columbia DNA. This marker was used to follow the segregation of this locus among a population of 96 recombinant inbred lines (L. erecta × Columbia). AtMSH2 was found to reside on the top arm of Arabidopsis chromosome III, 1.1 cM from locus m105. For AtMSH6-2, an RFLP mapping approach was used on a subset of 24 RI lines, and this indicated that AtMSH6-2 is also located on chromosome III and cosegregates with AB11. For AtMSH3, both CAPS and RFLP approaches were unsuccessful, owing to a lack of detectable polymorphism between the mapping ecotypes. However, the location of the recently sequenced BAC clone (M7J2) containing the AtMSH3 gene indicates that it maps on the top of chromosome IV (closest marker PG19). AtMSH6-1 also resides on chromosome IV, based on the mapping of BAC clone T10M13 (closest marker GT148).


Amino acid sequence comparisons 

Initially, the sequence alignment was restricted to the conserved region which comprises the four NTP-binding domains in the C-terminal region of the proteins (roughly 250 amino acids, see Fig. 1). As a general observation, the deduced sequences of the different AtMsh proteins are more similar to their human counterparts than to the yeast homologues. In the conserved region, AtMsh2 is 71% identical to the human Msh2 protein, AtMsh3 is 59% identical to the human Msh3, and AtMsh6-1 and AtMsh6-2 are, respectively, 55% and 54% identical to the human Msh6 (see Table 1). While these levels of identity are the highest observed, the Arabidopsis consensus sequences also resemble their respective orthologues (i.e. members of the same MutS family in other species) more than any paralogous Msh family members (i.e. members of other MutS families in the same species). The two Arabidopsis Msh6 amino acid sequences differ from each other, but still resemble each other more than they resemble the human or yeast Msh6 (Table 1). The three proteins described in this study are clearly not members of the Msh1 or Msh4 and Msh5 families, the mitochondrial and meiotic MutS homologues, respectively. Comparison of the full-length proteins corroborates these observations, although the alignment with CLUSTALW over their complete sequences should be considered less accurate, since no further manual refinement was done. This analysis shows that AtMsh2 is 40% identical (61% similar) to the entire human Msh2 protein, AtMsh3 is 36% identical (54% similar) to the human Msh3, and AtMsh6-2 is 29% identical (44% similar) to its human counterpart (human Msh6 and AtMsh6-1 are 31% identical and 48% similar to each other). In all instances, the levels of identity and/or similarity mentioned above are higher than for any other combination of compared proteins presented in Table 1. Furthermore, along the aligned protein sequences, some amino acid motifs are found to be specifically conserved within each of the three Msh families, and the Arabidopsis proteins also present these specific patterns (data not shown).

Table 1 Percent identity between Msh2, Msh3 and Msh6 sequences from humans (Hs), yeast (Sc) and Arabidopsis (At)

                   Msh2                       Msh3                       Msh6
Protein  Species   Hs       Sc       At       Hs       Sc       At       Hs       Sc       At-1
Msh2     Sc        73 (41)
         At        71 (40)  69 (36)
Msh3     Hs        47 (23)  47 (21)  45 (21)
         Sc        43 (23)  42 (23)  40 (23)  52 (31)
         At        44 (22)  46 (22)  50 (23)  59 (36)  50 (2?)
Msh6     Hs        46 (22)  45 (21)  47 (23)  46 (24)  44 (23)  46 (22)
         Sc        44 (20)  42 (20)  41 (21)  41 (24)  38 (24)  42 (24)  51 (20)
         At-1      45 (21)  43 (21)  43 (21)  50 (25)  40 (23)  44 (23)  55 (31)  4? (29)
         At-2      46 (20)  44 (22)  46 (21)  45 (24)  38 (17)  43 (19)  54 (29)  49 (28)  5? (29)

Values were calculated over the conserved region (see Fig. 1); values in parentheses were calculated for the entire proteins aligned using CLUSTALW.
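Percent identity and similarity depend on how gaps and "similar" residues are counted, and the text does not specify either. As a sketch under stated assumptions, the Python below counts only columns where neither sequence has a gap, and treats two residues as similar if they are identical or fall in one of the Clustal "strong" conservation groups. Other similarity schemes (for example, positive BLOSUM62 scores) would give different numbers, and the function names are hypothetical.

```python
from typing import Tuple

# Clustal "strong" conservation groups, used here as an assumed similarity scheme.
STRONG_GROUPS = ("STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW")


def similar(a: str, b: str) -> bool:
    """Identical residues, or residues sharing a strong conservation group."""
    return a == b or any(a in group and b in group for group in STRONG_GROUPS)


def identity_similarity(s1: str, s2: str) -> Tuple[float, float]:
    """Percent identity and similarity over gap-free columns of two aligned sequences."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(s1.upper(), s2.upper()) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no gap-free columns")
    n = len(columns)
    identical = sum(a == b for a, b in columns)
    alike = sum(similar(a, b) for a, b in columns)
    return 100 * identical / n, 100 * alike / n


if __name__ == "__main__":
    print(identity_similarity("MKLV-", "MRIVA"))   # hypothetical aligned fragments
```

Counting gap columns in the denominator instead, or normalizing by the length of the shorter protein, would lower the identity values, which is one reason full-length and conserved-region figures are not directly comparable.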

A phylogenetic study was performed using 197 unambiguously
aligned amino acids within or around the
consensus region at the C-terminal end (as in Fig. V). All
available Msh2, Msh3 and Msh6 sequences were analysed
using a maximum likelihood (ML) method that takes into
account the differential rate of amino acid substitution
among sites in a protein (through the use of a gamma
law). The topology of the tree confirms the classification
established on the basis of degree of identity (Fig. 4).
The three groups, consisting of Msh2, Msh3 or Msh6
homologues, are distinctly defined, and the Arabidopsis
Msh sequences we isolated are firmly assigned to their
respective groups. The evolutionary rate is lowest for
Msh2 and highest for Msh3 and Msh6 proteins. In all
three Msh subgroups, homologues from plants and
animals (except for Drosophila Msh2) tend to group
together and are separated from Msh proteins from
fungi. The occurrence of two intraspecies Msh6 homologues
that evolve independently may be restricted to
Arabidopsis (or plants), since divergence of these two
genes seems to have occurred after the divergence of
plants and animals.
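A "gamma law" for rate heterogeneity is commonly written as below. This is a sketch of the standard formulation, assuming the usual discrete-gamma approximation; the shape parameter α, the number of rate categories K and the amino acid substitution model are not reported in the text, so they are left symbolic.

```latex
% Relative substitution rate r of a site, gamma-distributed with shape \alpha and mean 1
f(r;\alpha) = \frac{\alpha^{\alpha}}{\Gamma(\alpha)}\, r^{\alpha-1} e^{-\alpha r},
\qquad r > 0, \qquad \mathbb{E}[r] = 1, \qquad \operatorname{Var}[r] = \frac{1}{\alpha}

% Likelihood of alignment column D_s on tree T, averaged over the unknown site rate;
% r_k is the mean rate of the k-th of K equal-probability gamma categories
L_s = \int_0^{\infty} f(r;\alpha)\, P(D_s \mid T, r)\, dr
\;\approx\; \frac{1}{K} \sum_{k=1}^{K} P(D_s \mid T, r_k)

% Log-likelihood of tree T over the N aligned positions (N = 197 in this analysis)
\ln L(T) = \sum_{s=1}^{N} \ln L_s
```

Small values of α concentrate substitutions in a few fast-evolving sites, whereas large values make rates nearly uniform across sites. Ignoring this heterogeneity tends to underestimate long branches, which is one reason rate variation matters for the tree discussed here.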

Conservation of a few intron positions between the
two AtMSH6 genes reinforces this observation: if only
strictly aligned intron positions are taken into account,
two introns are found at exactly the same positions in
both AtMSH6 genes (introns 5 and 14 of AtMSH6-2;
introns 7 and 15 of AtMSH6-1). One of these sites also
harbours an intron in the AtMSH3 gene (intron 14 of
AtMSH6-2, intron 15 of AtMSH6-1 and intron 10 of
AtMSH3). None of these intron positions is shared by
AtMSH2 or by any other MSH6 gene (data not shown).



Expression studies 

Expression of the different AtMSH genes was assessed
by Northern analysis performed with poly(A)+ RNA
(see Fig. 5). Because of the size and low expression level
of these genes, poly(A)+ RNA had to be used: their
transcripts migrate in the same region as the 28S rRNA,
which, in total RNA, makes the signal too diffuse to be
detected with confidence by autoradiography. Since the
AtMSH genes are very poorly expressed in plant tissues
(data not shown), we took advantage of an A. thaliana
cell suspension (Axelos et al. 1992). This cell suspension
is mitotically active: the cells grow exponentially for the
first 5 days following inoculation, before entering the
stationary phase; the number of growing cells, as
measured by their ability to form protoplasts, then starts
to decrease (data not shown). Northern analysis identified
mRNAs of approximately 3.4 kb for AtMSH2, 3.5 kb
for AtMSH3 and 3.7 kb for AtMSH6-2, in accordance
with the sizes predicted from the isolated cDNAs
(Fig. 5). On day 2, when the cells are in the early
exponential growth phase, the AtMSH6-2 transcript is
expressed at a higher level than on day 8; the same is true
for AtMSH2 and AtMSH3, albeit to a lesser extent.



Fig. 4 Phylogenetic analysis of the 197 aligned amino acids
from the conserved region of all available MSH2, MSH3 and
MSH6 sequences. In such a tree, the lengths of the horizontal
branches are such that the evolutionary distance between two
proteins is proportional to the total length of the horizontal
branches that connect them (vertical branch lengths are
arbitrary). Bootstrap values are shown at the nodes, and the
scale bar represents 10% sequence divergence

Fig. 5 Northern analysis of MSH gene expression in Arabidopsis
suspension cultures. Aliquots (2 µg) of poly(A)+ RNA were loaded in
lanes 1 and 2; 1 µg of total RNA was loaded in lane 3. RNA was
extracted from the Arabidopsis cell suspension cultures 2 days (lane 1)
and 8 days (lanes 2 and 3) after subculture. Northern hybridization
was carried out with the 32P-labelled 5' regions of the different AtMSH
genes (excluding the consensus region), and with complete AtRAD51
and translation elongation factor (EF) cDNAs and 28S rRNA
sequences. The bottom panel shows the ethidium bromide staining
pattern

These data can be correlated with AtRAD51 expression,
which, as expected, is higher on day 2 than on day 8
(Doutriaux et al. 1998). The probes used covered the 5'
regions of the three genes, outside the consensus regions,
and were chosen so that they would not cross-hybridize
with the different MSH genes. This is confirmed by the
fact that a single band of the expected size was detected
with each probe.



Discussion 



On the basis of their relationship with MSH genes from
other species, we have isolated three Arabidopsis homologues
of the mutS gene, which is known to be essential
for the repair of DNA mismatches in E. coli. Taking into
account another MSH gene encountered during the
course of the Arabidopsis genome sequencing project,
four MSH genes are now known in this plant species:
AtMSH2, AtMSH3 and two AtMSH6 genes (-1 and -2). These
four genes share sequence characteristics with all MSH
family members. The lengths and molecular weights of
the predicted proteins are similar to those of other MutS
homologues. Two highly conserved motifs, the NTP-binding
domain and a helix-turn-helix domain, are found in the
C-terminal region. From sequence comparisons, we
classify them as belonging to the Msh2, Msh3 and Msh6
families. At the genomic level, the three MSH genes we
isolated are unique and are detected as single bands
following digestion of genomic DNA with restriction
enzymes that do not recognize any sites in the probe.
Nevertheless, another MSH6 homologue exists in
Arabidopsis: AtMSH6-1 was detected during the systematic
sequencing of Arabidopsis chromosome IV (Johnson
et al. 1997). This gene differs from AtMSH6-2 in
sequence, chromosomal location and intron distribution;
furthermore, it is not detected with a probe that
includes the conserved region of AtMSH6-2.

Sequence comparisons between the conserved C-terminal
regions of the Msh proteins, as well as between their
complete sequences, clearly allow us to assign the
four AtMSH genes of Arabidopsis to the MSH2, MSH3
or MSH6 family. Such comparisons also led to other
important observations. It appears that the levels of
identity are much higher among the Msh2 orthologues
than among the Msh3 or Msh6 orthologues. It is
commonly believed that the more interactions a protein
is involved in, the lower its rate of evolution
(Dickerson 1971). As a functional mismatch repair
system relies on Msh2 binding either to Msh3 or to Msh6,
followed by an interaction with the Mlh1/Pms1
complex, Msh2 interacts with at least three proteins,
while Msh3 and Msh6 only bind to Msh2 (Prolla et al.
1994; Acharya et al. 1996). Two-hybrid experiments
have also identified PCNA, Exo1 and components of the
nucleotide excision repair pathway as partners of the
yeast or human Msh2 (Umar et al. 1996; Tishkoff et al.
1997; Bertrand et al. 1998; Gu et al. 1998). Such an
experimental approach has not been reported for Msh3
and Msh6, and therefore we cannot exclude the
possibility that other proteins interact with these gene
products. However, all indications suggest that Msh2 lies at
the center of a complex protein network; this might
more severely restrict the sequence fluctuations
permissible for this protein.

A phylogenetic study including all available eukaryotic
Msh (2, 3 and 6) sequences confirmed the previous
assignment of each of the four putative Msh proteins
from Arabidopsis to the three different Msh families. The
analysis of the phylogeny of the Msh proteins is
complicated by the heterogeneity of evolutionary rates,
which can lead to artefacts in tree construction, all the
more so when the number of aligned positions used is low.
The difficulty of reconstructing Msh phylogeny is
illustrated by the fact that the monophyly of fungal Msh6
sequences is only recovered when among-site rate variation
is taken into account (data not shown). Variation in
evolutionary rates is observed not only between paralogues
but also between species within each group of
paralogues (see, for example, the branch lengths for
fungal sequences in Fig. 4). The most likely artefact is
the long-branch attraction phenomenon (Felsenstein
1978), which generally results in the incorrect early
emergence of fast-evolving sequences (Philippe and
Laurent 1998). For instance, the Msh2 sequence of
Drosophila, which emerges at the base of this group, far
from other Metazoa, is very likely to be misplaced
because of this phenomenon. Similarly, although fungi
are known to be the closest relatives of animals (Baldauf
and Palmer 1993), phylogenies based on Msh have
always found that animals are closely related to plants and
not to fungi, which could be due to a higher rate of
evolution of MMR proteins in fungi. Interestingly, the
same observation, i.e. an increased evolutionary rate in
fungi and Drosophila, has previously been reported in a
phylogenetic analysis of Rad51 (Yeager Stassen et al.
1997). Finally, the finding that two intron positions are
common to both Arabidopsis MSH6 genes, and that one
of these also coincides with an intron position in
AtMSH3, may argue in favour of the relatedness of these
two families.

The occurrence within the same species of two Msh6
homologues that evolve independently is unexpected,
and may be restricted to Arabidopsis (or plants), since
divergence of these two genes occurred after the
separation of plants and animals. Nevertheless, one may
speculate about the situation in other eukaryotes. As its
genome has been completely sequenced, there is no
doubt about the uniqueness of MSH6 in S. cerevisiae.
Although a single MSH6 gene has been described for
human and mouse, this cannot be considered definitive
as long as the human genome has not been completely
sequenced. Despite the considerable effort that has been
devoted to the identification of all human MSH genes,
MSH4 and MSH5, the meiotic mutS homologues, were
discovered only recently (Paquis-Flucklinger et al. 1997;
Her and Doggett 1998). Except for the particular case of
Sarcophyton glaucum (Pont-Kingdon et al. 1998), no
MSH1 gene has yet been found in any higher eukaryote,
but in view of the rate and spectrum of mitochondrial
mutations in human cells, its existence remains
questionable (Khrapko et al. 1997). In fact, the single-copy
nature of the MSH6 genes is strongly supported by
the persistence of a specific and similar phenotype when
Msh6 is defective in either S. cerevisiae or mammals
(Marsischky et al. 1996; Edelmann et al. 1997).
Phylogenetic analysis also favours the idea that only plants
have two Msh6 proteins that evolve independently. In
the absence of expression or functional studies, it is clear
that we cannot yet conclude that an AtMsh6-1 protein is
in fact active in Arabidopsis. However, the phylogenetic
analysis suggests that AtMsh6-1 is a functional protein;
otherwise it would be expected to have accumulated
mutations in the known functional domains. Whether
AtMsh6-2 and AtMsh6-1 coexist in the same tissues, are
functionally redundant or have acquired different
specialized functions will have to be assessed in the future.

The three MSH genes we describe are expressed in
Arabidopsis: cDNA clones were successfully obtained
and mRNAs specific to each gene were detected in
Northern blot experiments. The AtMSH2, AtMSH3 and
AtMSH6-2 transcripts differ in size, and their estimated
sizes correlate with the lengths of the cDNA sequences.
All three genes are expressed in a cell suspension derived
from A. thaliana, at slightly higher levels in the
exponential growth phase than in the stationary phase. We
also find a much higher level of AtRAD51 transcripts in
the cells on day 2 than on day 8; AtRAD51 has been
shown previously to be regulated in S-phase (Doutriaux
et al. 1996). Precise assessment of the phase of induction
in the cell cycle would require cell synchronization. In
S. cerevisiae, induction in early S-phase has been
described for the MSH2 and MSH6 genes, while MSH3
transcription was found to be constitutive during the cell
cycle (Kramer et al. 1996). In E. coli, MutS was also
found to be depleted in stationary-phase cultures (Feng
et al. 1996). Overall, these data support the idea that
MSH genes are expressed at a time when cells are
dividing actively and thus replicating their DNA.